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Abstract 

The so-called factorization methods recover 3-D rigid structure from motion by factorizing an observation 
matrix that collects 2-D projections of features. These methods became popular due to their robustness — they use a 
large number of views, which constrains adequately the solution — and computational simplicity — the large number 
of unknowns is computed through an SVD, avoiding non-linear optimization. However, they require that all the 
entries of the observation matrix are known. This is unlikely to happen in practice, due to self-occlusion and limited 
field of view. Also, when processing long videos, regions that become occluded often appear again later. Current 
factorization methods process these as new regions, leading to less accurate estimates of 3-D structure. In this paper, 
we propose a global factorization method that infers complete 3-D models directly from the 2-D projections in 
the entire set of available video frames . Our method decides whether a region that has become visible is a region 
that was seen before, or a previously unseen region, in a global way, i.e., by seeking the simplest rigid object 
that describes well the entire set of observations. This global approach increases significantly the accuracy of the 
estimates of the 3-D shape of the scene and the 3-D motion of the camera. Experiments with artificial and real 
videos illustrate the good performance of our method. 

Index Terms 

3-D rigid structure from motion, complete 3-D models, matrix factorization, factorization with missing data, 
model selection, penalized likelihood, expectation-maximization, subspace approximation. 

I. Introduction 

In areas ranging from virtual reality and digital video to robotics, an increasing number of applications 
^ ; need three-dimensional (3-D) models of real- world objects. Although expensive active sensors, e.g., 
^ ■ laser rangefinders, have been frequently used to acquire 3-D data, in many relevant situations only 
•i-j ! two-dimensional (2-D) video data is available and the 3-D models have to be recovered from their 2- 
rN \ D projections. In this paper, we address the automatic recovery of complete 3-D models from 2-D video 
c3 sequences. 



A. Related work and motivation 

Since the strongest cue to infer 3-D shape from video is the 2-D motion of the image brightness 
pattern, our problem has been often referred to as structure from motion (SFM). Since using a large 
number of views, rather than simply two consecutive frames, leads to more constrained problems, thus 
to more accurate 3-D models, current research has been focused on multi-frame SFM. The factorization 
method introduced in ll38ll overcomes the difficulties of multi-frame SFM — nonlinearity and large number 
of unknowns — by using matrix subspace projections. In E8ll , the trajectories of feature points are collected 
into an observation matrix that, due to the rigidity of the 3-D object, is highly rank deficient in a 
noiseless situation. The 3-D shape of the object and the 3-D motion of the camera are recovered from 
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the rank deficient matrix that best matches the observation matrix. The work of (38ll was extended in 
several ways, e.g., geometric projection models OTll. (30ll . 3-D shape primitives (34ll . OTll . (4]], recursive 
formulation (27L uncertainty (28ll . (3, 0, 0, multibody scenario 0311 . Besides recovering 3-D rigid 
structure from video sequences, recent approaches to several other image processing/computer vision 
problems require determining linear or affine low dimensional subspaces from noisy observations. These 
problems include object recognition (51, Il39lh applications in photometry (35ll . (29ll . ifTOt and image 
alignment 03, (43ll. 

In general, such low dimensional subspaces are found by estimating rank deficient matrices from noisy 
observations of their entries. When the observation matrix is completely known, the solution to this 
problem is easily obtained from its Singular Value Decomposition (SVD) [16J. However, in practice, the 
observation matrix may be incomplete, i.e., some of its entries may be unknown (unobserved). When 
recovering 3-D structure from video, the observation matrix collects 2-D trajectories of projections of 
feature points [SI, (371, (H, (23, M, or other primitives (3l, ED, EEL E9- In real life video 
clips, these projections are not visible along the entire image sequence due to the occlusion and the limited 
field of view. Thus, the observation matrix is in general incomplete. For the specific cases we focus in this 
paper — videos that show views all around (non-transparent) 3-D objects — this incomplete characteristic 
of the observation matrix is particularly noticeable. 

The problem of estimating a rank deficient matrix from noisy observations of a subset of its entries 
deserved then the attention of several researchers. References (38ll and (24ll propose sub-optimal solutions. 
In (38lh the missing values of the observation matrix are "filled in", in a sequential way, by using SVDs 
of observed submatrices. In (24ll . the author proposes a method to combine the constrains that arise from 
the observed submatrices of the original matrix. More recently, several bidirectional optimization schemes 
were proposed, e.g., (251, ES, 03 > GO* E3, EH, Ell, and the problem was also addressed by using 
nonlinear optimization lfT3ll . 

However, none of the methods above deal with "self-inclusion", i.e., with the fact that a region that 
disappears due to self-occlusion may appear again later. When this happens, the re-appearing region is 
usually treated as a new region, i.e., as a region that was never seen before. This procedure has two 
drawbacks: i) the problem becomes less constrained than it should, thus leading to less accurate estimates 
of 3-D structure; and ii) further 3-D processing is needed to fuse the recovered multiple versions of the 
same real- world regions. 

B. Proposed approach 

We propose a global approach to recover complete 3-D models from video. Global in the sense that our 
method computes the simplest 3-D rigid object that best matches the entire set of 2-D image projections. 
This way we avoid having to post-process several partial 3-D models, each obtained from a smaller 
set of frames, or an inaccurate 3-D model obtained from the entire set of frames without detecting 
re-appearing regions. We develop a global cost function that balances two terms — model fidelity and 
complexity penalization. The model fidelity term measures the error between the model — a 3-D shape 
and a set of re-appearing regions — and the observations, as in Maximum Likelihood (ML) estimation. 
This error is simply given by the distance of a re-arranged observation matrix to the appropriate space 
of rank deficient matrices. The penalty term measures the complexity of the 3-D model, which is easily 
coded by the number of feature points used to describe the observations. By minimizing this global cost, 
we get what statisticians usually call a Penalized Likelihood (PL) lfT8ll estimate of the 3-D structure. 
Through PL estimation, re-appearing regions are then detected when the increase of the complexity of 
the 3-D model does not compensate a slightly better fit to the observations, meaning that a more complex 
3-D model would fit the observation noise rather than the 3-D real- world object. 

The minimization of the PL cost function requires the computation of rank deficient matrices from partial 
observations. In the paper, we study two iterative algorithms developed for this problem. The first algorithm 
is based in a well known method to deal with missing data — the Expectation-Maximization (EM) [|26ll . 
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Although the authors don't refer it, the bidirectional scheme in ll25ll is also EM-based. However, as detailed 
below, the EM algorithm we propose is more general and computationally simpler than the one in ll25ll . 
Our second two-step iterative scheme is similar to Wiberg's algorithm iHTTl and related to the one used 
in Il36ll to model polyhedral objects. It computes, alternately, in closed form, the row space matrix and the 
column space matrix whose product is the solution matrix. We call this the Row-Column (RC) algorithm. 

In the paper, we illustrate the behavior of both iterative algorithms with simple cases and evaluate their 
performance with more extensive experiments. In particular, we study the impact of the initialization on 
the algorithm's behavior. From these experiments, we conclude that the RC algorithm is more robust 
than EM in what respects to the sensibility to the initialization. Furthermore, the number of iterations 
needed for good convergence and the computational cost of each iteration are both smaller for RC than 
for EM. Obviously, the performances of both EM and RC improve when the initial estimate provided 
to the algorithms is more accurate. Any sub-optimal method, e.g. the ones in Il38lh E4ll . can be used to 
compute such an initialization. Our experience shows that, with a simple initialization procedure, even for 
high levels of noise and large amount of missing data, both iterative algorithms converge: i) to the global 
optimum; and ii) in a very small number of iterations. Naturally, under our global factorization approach, 
the rank deficient matrices can also be computed by using nonlinear methods such as [TT3ll . 

We use EM and RC in the context of recovering complete 3-D models from video, through the 
minimization of our global PL cost. Our experiments show that fitting the rank deficient matrix to 
the entire (re-arranged) observation matrix (which, naturally, misses several entries) leads to better 3- 
D reconstructions than those obtained by combining partial 3-D models estimated by fitting submatrices 
to smaller subsets of data (each corresponding to a subset of features that were visible in a subset of 
consecutive frames). 

C. Paper organization 

In section JI] we overview the matrix factorization approach to the recovery of 3-D structure. Section HIJ 
details the problem addressed in this paper — the direct recovery of complete 
3-D models from incomplete 2-D trajectories of features points. In section [IV] we describe our solution 
as the minimization of a global cost function. Section [V] describes the algorithms used to estimate rank 
deficient matrices from incomplete observations. In sections [VT] and IVIIi we present experiments and an 
application to video compression. Section IVIIII concludes the paper. 

Preliminary versions of parts of this work appeared in lfT9L Il20lh lt2TTl . ifTTll . 

II. Matrix Factorization for 3-D Structure from Motion 

In the original factorization method Il38lh the authors track P feature points over F frames and collect 
their 2-D trajectories in the 2F x P observation matrix W. The observation matrix is written in terms of 
the parameters that describe the 3-D structure (3-D shape and 3-D motion) as 

W = RS T + tl= [ R|t ] 

where R and t represent the rotational and translational components of the 3-D motion of the camera 
and S represents the 3-D shape of the scene. Matrix R is 2F x 3. It collects entries of the 2F 3-D rotation 
matrices that code the camera orientations. The 2F x 1 vector t collects the 2F camera positions. Matrix S 
is P x 3. It contains the 3-D coordinates of the P feature points. See [38] for the details. 

The problem of recovering 3-D structure from motion (SFM) is then: given W, estimate R, S, and t. 
Due to the specific structure of the entries of the 3-D rotation matrices in R, see for example Q, this is an 
highly non-linear problem. The factorization method uses a subspace approximation to derive an efficient 
solution to this problem. In ll38lh the authors noted that, in a noiseless situation, the 2F x P observation 
matrix W in (OQ) belongs to a restricted subspace — the space of the rank 4 matrices. In practice, due to 
the noise, although the observations matrix is in general full rank, it is well approximated by a rank 4 



+ noise, (1) 
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matrix. The method in Il38ll recovers SFM performing two separate steps. The first step computes the 
rank 4 matrix that best matches the observation matrix W. The second step performs a normalization that 
approximates the constraints imposed by the entries of the rotation matrices in R. 

To compute the rank 4 matrix that best matches the observations, most common techniques use the 
Singular Value Decomposition (SVD) of W. This requires that all entries of W are known, which means 
that the projections of the feature points to be processed were observed in all frames, i.e., that they are 
seen during the entire video sequence. In real world applications, this assumption limits severely the 
feature point candidates because very often important regions that are seen in some frames are not seen 
in others due to the scene self-occlusion and the limited field of view. 

III. Recovering Complete 3-D Models: Occlusion and Re-appearing Features 

Since in many practical situations, several regions of the 3-D scene to reconstruct do not appear in the 
entire set of images of the video sequence, the observation matrix has several unknown entries. Naturally, 
a suboptimal solution to the recovery of 3-D structure in such situations is to process separately several 
completely known submatrices of the observations matrix. However, this leads to a less constrained 
problem — the global rigidity of the scene is not appropriately modelled. 

Another aspect that must be taken into account is the fact that regions that disappear may re-appear 
later. For example, when attempting to build a complete model of a 3-D object, we must use a video 
stream containing views that completely "cover" the object, typically, a video obtained by rotating the 
camera around the object. Obviously, as the camera moves, some feature points disappear, due to object 
self-occlusion, remain invisible during certain period and then re-appear. In general, each feature point has 
thus several tracking periods. Although this is always the case when constructing complete 3-D models, 
it also happens very frequently when processing real-life videos in general. 

Even current methods that deal with occlusion, e.g., El, G3, (ED, 021 (D, [El, W, d, d, 
do not consider a re-appearing feature as another observation of a previously seen point. They rather 
consider as many feature points as tracking periods. To illustrate this point, we represent on the left image 
of Fig. Q] the typical shape of the known entries of the observation matrix W. Each feature trajectory 
is represented by a column of W. The three last columns (in gray) correspond to re-appearing features, 
i.e., they are second tracking periods of the features that were first tracked and collected in the three first 
columns. When a given feature has two tracking periods, current methods like the ones in the references 
above, return a 3-D shape containing two 3-D feature points that correspond to the same 3-D point of 
the real-world object. Although these two 3-D feature points would coincide in a noiseless situation, in 
practice they are just close to each other. To demonstrate this, we represent on the left plot of Fig. [2l the 
3-D shape recovered from a set of synthesized trajectories of 40 features, by using the method in lfT9ll . 
To simulate occlusion, three of the trajectories were artificially "interrupted", leading to three pairs of 
recovered 3-D points, marked with small circles in the left plot of Fig. [2l 




Fig. 1. Left: original observation matrix W. Right: re-arranged observation matrix Wr, after detecting re-appearing features. 

A simple way to detect re-appearing features is based on a local analysis of the distance between 
recovered 3-D points. However, this procedure is hard in practice due to the sensitivity to the threshold 
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Fig. 2. 3-D shape recovered from the original matrix W, i.e., without detecting re-appearing features (left) and from the re-arranged 
matrix Wjj, i.e., after detecting re-appearing features (right). In the left plot, three feature points are erroneously recovered as 3-D point 
pairs (marked with small circles). See in Fig. Q] the representation of the known entries of W and Wr. 

below which the features would be considered to correspond to the same 3-D point. In fact, the distance 
between the features that correspond to the same 3-D point, depends not only on the noise level but also 
on the camera-object distance, which is very difficult to estimate accurately enough. The detection of 
re-appearing features must then be based on a global approach. 

Our approach in this paper is then to re-arrange the observation matrix W into a smaller matrix W R 
that merges all the tracking periods of the same feature in the same column. For the observation matrix W 
shown on the left side of Fig. [H the re-arranged matrix W R would be as shown on its right. Finding 
matrix W R is equivalent to detecting the re-appearing features. Note that the statements of section [D] 
remain valid for the re-arranged matrix W R ; in particular, W R is rank 4 in a noiseless situation, just like 
the original observation matrix W. As pointed out before, the advantage of using W R rather than W is that 
the SFM problem becomes more constrained, thus leading to more accurate estimates of the 3-D structure. 



We formulate the detection of re-appearing features as a model selection problem, where a model is 
represented by a re-arranged observation matrix W r . Matrix W r determines the number of feature points 
of the 3-D model, P r , and the correspondences between columns of the original observation matrix W 
and points of the 3-D real-world object. To select the best model W#, we propose a global approach — we 
seek the simplest 3-D rigid object that describes the entire set of (incomplete) feature trajectories. 

A. Penalized likelihood cost 

In our approach, re-appearing features are detected through the minimization of a global cost-function 
that takes into account both the accuracy of the model and its complexity. Thus, the re-arranged matrix W R 
is given by 



where E Sa (W r ) measures the error of the model W r as its distance to the space of rank 4 matrices, 
i.e., it is a likelihood term, and P r is the number of feature points of the model, i.e., it codes the model 
complexity (matrix W r is 2F x P r ). The parameter a balances the two terms. Naturally, the minimization 



IV. Global Approach: Penalized Likelihood Estimation 
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in © is constrained to the set of matrices that can be found by re-arranging the original observations 
matrix W. We denote this set by 7Z(W). 

The likelihood term E Sa (W r ) in © models the object rigidity. As outlined in section [III in a noiseless 
situation, the matrix collecting the trajectories of points of a rigid object is rank 4. This term is then 
naturally given by the distance of the matrix W r to the space of the 2F x P r rank 4 matrices. Since not 
all the entries of W r are known, we measure this distance by masking the unknown entries: 



E Sa (W r ) = min (w r - w) © M. 



(3) 

wes 4 1 v 7 F 

where ||.|| F represents the Frobenius norm, which arises from assuming that the observations are corrupted 
by white Gaussian noise, <S 4 denotes the space of the 2FxP r rank 4 matrices, represents the elementwise 
product, also known as the Hadamard product, and the matrix M r is a binary mask that accounts for the 
known entries of the matrix W r , i.e., the entry of M r is = 1 if the entry of W r is known 
and rriij = otherwise. 

In Bayesian inference approaches, the term aP r in © is considered to be a prior and the mini- 
mization © leads to the Maximum a Posteriori (MAP) estimate [fTTll (the cost function in © is the 
negative log posterior probability). Penalization terms, such as aP r in ©, have also been introduced 
in the model selection literature through information-theoretic criteria like Akaike's AIC [32] or the 
Minimum Description Length (MDL) principle [8j. Naturally, different basic principles lead to different 
choices for the balancing parameter a but the structure of the cost function is most often as in ©. This 
generic form © is usually called by the statisticians a penalized likelihood (PL) cost function [fT8l . In 
our case, to find a valid range for the weight parameter a, we performed several experiences. By testing 
with pertinent values for the number of features, number of images, noise level, % of missing data, and 
number of re-appearing features, we concluded that there is a wide range of values for a that leads to a 
good balance between maximizing the number of correct detections of re-appearing features (probability 
of detection) and minimizing the number of incorrect detections (probability of false alarm). 

B. Minimization algorithm 

To find candidates W r e 7Z(W), our algorithm starts by selecting from the original observation 
matrix W, pairs of columns that can be merged with each other. Naturally, these pairs correspond to 
disjoint tracking periods. For example, for the matrix W in the right side of Fig. Q] each one of the six 
last columns could be merged with the first column, therefore, in what respects to the feature corresponding 
to the first column, there are seven possible situations: it could either have re-appeared, generating one 
of the trajectories of the six last columns, or remain occluded for the remaining of the video. 

To obtain a computationally feasible algorithm, we prune the search — our algorithm decides by com- 
paring the costs of merging each selected pair of columns with the one of considering that there are not 
re-appearing features, i.e., of the model W r = W. The process is then repeated until the cost © does 
not decrease by merging any selected pair of columns. The search could be further pruned, e.g., using the 
local approach outlined in the previous section to guide the process, thus testing only pairs of columns 
corresponding to feature points that are close in 3-D. 

What remains to be addressed is the computation of the likelihood term Es 4 (W r ), given by expres- 
sion ©. The minimization in © requires finding the rank 4 matrix that best matches the incompletely 
known matrix W r . There is not known closed-form solution for this problem. In the following section, 
we develop iterative algorithms that compute such rank deficient approximations in very few iterations. 
The likelihood term £s 4 (W r ) is then computed as 

S 54 (W r ) = |(w r -Wr)©M 
where W r is the rank 4 matrix that best matches the known entries of the matrix W r , i.e. 



(4) 

F 



W r = arg min 

we<s 4 



W r — W ) M 



(5) 
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The iterative algorithms we describe in the sequel converge in very few iterations to W r , when adequately 
initialized. In the appendix, we describe a strategy to compute such initialization. We reduce the compu- 
tational burden of the global approach by performing this initialization step only once, when testing the 
model that corresponds to the original observation matrix, W r = W, and using the same initial guess 
when testing the other models. 



V. Estimation of rank deficient matrices with Missing Data 

We now address the estimation of the rank deficient matrix W r given by expression (|5]). To gain insight 
and introduce notation, let us first consider the case where the 2F x P r matrix W r is completely known, 
i.e., rriij = 1, V^- ^ M r = 3-2FxP r - In this case, © is simplified to 



W r = arg mm 
we<s 4 



W r - W 



(6) 



The solution W r of © is known — it is obtained by selecting the 4 largest singular values, and the 
associated singular vectors, from the Singular Value Decomposition (SVD), W r = USV, of the 2F x 
P r matrix W r [16]. We denote this optimal rank reduction, i.e., the projection onto <S 4 , by W r |<S 4 : 

Wr = W r |5 4 = U 2Fx4 S 4x4 V 4xPr . (7) 

When matrix W r misses a subset of its entries, the estimation of W r leads to the minimization of (|5]), 
a generalized version of (|6]). The existence of unknown entries in W r prevents us to use the SVD of W r 

as in ©. In the following subsections we introduce two iterative algorithms that minimize ®. Both 

— (o) 

algorithms are initialized with a guess W r , computed as described in the appendix. 



A. Expectation-Maximization algorithm 

The Expectation-Maximization (EM) approach to estimation problems with missing data works by 
enlarging the set of parameters to estimate — the data that is missing is jointly estimated with the other 
parameters. The joint estimation is performed, iteratively, in two alternate steps: i) the E-step estimates 
the missing data given the previous estimate of the other parameters; ii) the M-step estimates the other 
parameters given the previous estimate of the missing data, see [26] for a review on the EM algorithm. 

. -(0) 

In our case, given the initial estimate W r , the EM algorithm estimates in alternate steps: i) the 
missing entries of the input matrix W r ; ii) the rank 4 matrix W r that best matches the "complete" data. 
The algorithm performs these two steps until convergence, i.e., until the error measured by the Frobenius 
norm in © stabilizes. 

(fc-i) 

1 ) E-step — estimation of the missing data: Given the previous estimate W r , the estimates of the 

r . - (fc-1) 

missing entries {wij : m^=0j of W r are simply the corresponding entries of W r . We then 

build a complete observation matrix W r , whose entry equals the corresponding entry of the 
matrix W r if was observed or its estimate if is unknown, 

w l3 = { W J> * m - = I (8) 
13 \ Wij if my =0, 



or, in matrix notation, 



w; v = w r 0M r + w; © 



(9) 
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2 ) M-step — estimation of the rank deficient matrix: We are now given the complete data matrix W r , 

~{k) OA 

with the estimates of the missing data from the E-step. The rank 4 matrix W r that best matches W r 
in the Frobenius norm sense, is then obtained from the SVD of as in ©, 

VK (k) =W~ r (k) lS 4 . (10) 

In reference [|25lh the authors develop a bidirectional algorithm to factor out an observation matrix 
with missing data, in the context of recovering rigid SFM. Their bidirectional algorithm is in fact an EM 
algorithm developed to the specific strategy of treating the 3-D translation separately. In opposition, the 
EM algorithm just described is general, i.e, it solves any rank deficient matrix approximation problem 
with missing data. Furthermore, our E-step in © is simpler than the corresponding step of E511 that 
requires inverting matrices. 



B. Row-Column Algorithm 

We now describe the Row-Column (RC) algorithm — another iterative approach, similar to Wiberg's 
algorithm [HTT] . to the estimation of a rank deficient matrix that best matches an incomplete observation. 
From our experience, summarized in section [VH the RC algorithm is not only computationally cheaper 
than EM, avoiding SVD computations and exhibiting faster convergence, but also more robust than EM 
to initializations far from the solution. 

For the RC algorithm, we parameterize the 2F x P r rank 4 matrix W in © as the product 



W 



L 2 Fx4 B 4x p r 



e S A 



(11) 



where A determines the column space of W and B its row space. The rank 4 matrix W r that best 
matches W r is obtained by minimizing the cost function in © with respect to (wrt) the column space 
and row space matrices, i.e., 



W. = AB, 



where 



{A,B} 



arg mm 

A,B 



W r - AB © M. 



(12) 



By using the re-parameterization (TTHl we map the constrained minimization © wrt Wg5 4 into the 

unconstrained minimization (fT2l) wrt A and B. 

We minimize (fT2l) in two alternate steps: i) the R-step assumes the column space matrix A is known and 

estimates the row space matrix B; ii) the C-step estimates B for known A. The algorithm is initialized by 

f s - — -(o) 

computing A^ Uj from the initial estimate W r and it runs until the value of the norm in (1121) stabilizes. 

When there is no missing data, i.e., when M r = 1 in (fT2l) . the R- and C- steps above are linear least 

squares (LS) problems whose solutions are obtained by using the Moore-Penrose pseudoinverse lfT6lh 



-i) T A (k-i) 



A (k- 



A<*>=W r B ( * )T (B<*>BW 



(13) 



If we write steps R and C together as a recursion on one of the matrices A or B, say, on the column 
space matrix A, we get 

-l 



A« = W.WjA^" 1 ) (AM T W.WjAM) * AM" AM 



(14) 

which shows that, in the no missing data case, our RC algorithm is in fact implementing the application 
of the power method lfT6ll to the matrix W r W^ (the factor (A T W r W;fA)~ 1 A T A is the normalization). 
The power method has been widely used to avoid the computation of the entire SVD when fitting rank 
deficient matrices to complete observations. We will see that, even when there is missing data, both R- 
and C- steps admit closed-form solution and the overall algorithm generalizes the power method in a very 
simple way. 
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1) R-step — estimation of the row space: For known A, the minimization of (fT2l) wrt B can be rewritten 
in terms of each of the P r columns {b p ,p = 1, . . . , P r } of B, 



arg mm 



Ab v ] m z 



(15) 



where w p and m p denote the p th columns of W r and M r , respectively. Exploiting the structure of the 
binary column vector m p , we now re-arrange the minimization in (fT5l) in such a way that its solution 
becomes obvious. First, we note that m p in (fT51) is just selecting the entries of the error vector (w p — Ab p ) 
that affect the error norm. Then, by making explicit that selection in terms of the entries of w p that contain 
known data and the corresponding relevant entries of A, we rewrite (IT3I) as 



bp arg min 

hp 



w p © m p - (A © m p li x4 ) 



(16) 



Note that m p li x4 in (IT6l) is simply a 2Fx4 matrix with all 4 columns equal to m p . 

The minimization in (IT6l) is now clearly expressed as a linear LS problem. Its solution b n is then 
obtained by using the pseudoinverse of matrix (A m p li x4 ), 



b p = 



(A © m p li x4 ) T (w p © mp) , 



(17) 



(A © m p li x4 ) (A © m p li x4 ) 

which is simplified by omitting repeated binary maskings, 

b p = [A T (A © m p l lx4 )] _1 A T (w p m p ) . (18) 

The set of P r estimates {b p ,p = 1, . . . , P r } as in (TTSl) generalizes the well known pseudoinverse LS so- 
lution in (fT3l) to problems with missing data. 

2) C-step — estimation of the column space: Given B, the estimate of each row a/ of the column space 
matrix A is obtained by proceeding in a similar way as in the R-step. We get, for each / = 1, . . . , 2F, 

a/ = (w f © m/ ) B T [(B l4xim f ) B T ] _1 , (19) 
where in this case, for commodity, lowercase boldface letters denote rows rather than columns. 



VI. Experiments — rank reduction with missing data 

Since approximating rank deficient matrices from incomplete observations is a key step in recovering 
3-D structure, we start by describing experiments that illustrate the behavior of the EM and RC algorithms 
just described and demonstrate their good performance. 



A. 2x2 matrices 

We start by a simple case that allows an illustrative graphical representation — approximating 2x2 
rank 1 matrices from observations W r that miss one of the entries. In this case, the error measured by 
the Frobenius norm in expression^ © and (fT2l) . can be expressed in terms of a single parameter 9. In 
fact, let the 2 x 2 rank 1 matrix W be written in terms of its column and row spaces as in (fTTT) . 

W = a 2xl b lx2 e Sl (20) 

Without loss of generality, impose that the row vector b has unit norm and write it in terms of a row 
angle 9, 

b = [ cos9 sin9 ] . (21) 

Now denote the minimum of (fT2l) wrt the column space a for fixed row space b, i.e., for fixed 9, by 
a(W r , 9), given by (fT9l) . The error in © and (IT2l) is then rewritten as a function of 9 as 

error(#) = || (W r - a(W r , 0) [ cos9 sin9 ]) 0M r || F . (22) 
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Note that for any set of three entries of a 2 x 2 matrix, there is always a value for the forth entry that 
makes the rank of the matrix equal to one, i.e., there is always a 2 x 2 rank 1 matrix W that fits exactly 
the observed entries of W r . Thus, we have min error (0) = 0. 

The examples in the sequel illustrate the impact of the initialization on the behavior of the EM and RC 
algorithms with experiments that use the following ground truth rank 1 matrix W, observation mask M r , 
and observation W r : 



W 



-1 -1.95 
2 3.9 



1 1 
1 



-1 -1.95 

2 ? 



(23) 



where "?" represents the unobserved entry w 22 of the observation matrix W r 



B. Typical good behavior of EM and RC 
Using the initial guess 



- — -(o) 
W r = 



-1.95 




(24) 



-(*) 



we describe the evolution of the estimates W r of the rank 1 matrix by plotting two equivalent 

representations of W r : i) its row space b = [bi,b 2 ]; and ii) the corresponding angle 9 as defined 
in (12TT) . The left plot of Fig. [3] shows the level curves of the error function as function of the row vector 
b = [&i, b 2 ], superimposed with the evolution of the estimates of b for EM (dashed line) and RC (solid 
line). In this plot, the dotted line (optimum) are the row vectors that lead to zero estimation error. The 
right plot of Fig. [3] represents the same error function, now as a function of 9, as defined in (l22l) . (dotted 
line) superimposed with the locations of the 9 estimates for EM (dashed line) and RC (solid line). 




0.5 1 1.5 2 



RC 

O (final) 

- - EM 

O (final) 

— error 



Fig. 3. Typical good behavior. Left: trajectories of the estimates of the iterative algorithms EM and RC. Right: error function. 

From the left plot of Fig. [3l we see that both EM and RC trajectories start at the same point (due to 
the equal initialization) and converge to points in the optimal line. As expected, the EM estimates of the 
row space vector have constant unit norm (due to the normalization in the SVD) while the RC estimates 
don't. The good behavior of the algorithms is confirmed by the right plot of Fig. [3] that shows that both 
algorithms converge to a value of 9 that makes error (9) = 0, i.e., that minimizes error (9). 

To evaluate the convergence speed, we show in Fig. |4] the evolution of the estimation error along the 
iterative process for both algorithms (in the left plot, linear scale, in the right one, logarithmic scale). 
From Fig. HI we see that RC converges in a smaller number of iterations than EM. Our experience with 
larger matrices in practical applications have confirmed the faster convergence of RC. 
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Fig. 4. Typical good behavior. Left: estimation error. Right: the same in logscale. 



C. Large entries in rows or columns that miss data 

We now illustrate a drawback of the EM algorithm. Using the same data and the initial estimate 

-1 



(o) 

W r = 



-1.95 
22 



(25) 



we get the first 1000 iterations of Fig. [51 which is as described above for Fig. [3l 




■ RC 

(final) 
EM 
(final) 
error 



1 1.2 1.4 1.C 
angle 9 



Fig. 5. Large initialization. Left: EM and RC trajectories. Righterror function. 

From the left plot of Fig. O we see that, while RC converges to the optimal line, the estimates given 
by the EM almost don't change along the iterative process. This can also be seen in the right plot of 
Fig. [5l which shows that RC converges to the 6 such that error(#) = 0, while EM, after 1000 iterations 
is still far from argmin error (0). The left plot of Fig. [6] shows the evolution of the estimation errors. See 
that while the error of RC converges to zero in a few iterations, the error of EM almost doesn't decrease 
during the first 100 iterations. 

The bad behavior of EM is due to the large initial guess for w 2 2 (note that is large when compared 

with the known entries of W r ). Remember that EM starts by estimating the rank 1 matrix that best matches 

(o) (o) 

the initial guess W r . This rank 1 matrix is the matrix associated with the largest singular value of W r , 

which is highly constrained by the large jspurious entry w$ = 22(note that while the singular values of the 

ground truth matrix W are 0"i(W) ~ 4.9 and 



a 2 (W) 



<0) 



(OK 



0, the singular values of the initial guess W r v ' are <7i(W r ) ~ 22.2 and 
: 0.8 due to the large entry w<$ = 22). Then, EM replaces the known entries of W r in the 
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Fig. 6. Large initialization. Left: estimation error. Right: largest singular value of the estimate at the k— th iteration. 



new estimate (obtaining thus an estimate that is very close to the initial guess) and repeats the process. To 

better illustrate this very slow convergence of EM, we represent, in the right plot of Fig. [6l the evolution 

- — <k) 

of the largest singular value of the estimate W r for both algorithms. We see that while, as expected, 
the Jargest singular value of the RC estimates converges to the largest singular value of the solution W, 

oi (W) ~4.9, the largest singular value of the first 100 iterations of the EM estimates changes very slowly 

- — -(°) 

from its initial value <7i(W r )~22.2. 

The behavior just described also happens in situations other than the initial guesses of the unknown 
entries being too large when compared to the other entries. In fact, we observed the same behavior in 
situations where the observation matrix had large entries in rows or columns that contained missing entries. 
In these situations, due to those large entries, even small values for initial guess of the unknown entries 
had large impact on the row and column singular vectors associated with the large singular values that 
determined the best rank deficient approximations to the complete data matrices involved in EM. This 
lead to the same kind of very slow convergence illustrated in Fig. [5] and Fig. [6l 



D. Large matrices 

We now present results of approximating matrices likely to appear when processing real video sequences. 
We tested the EM and RC algorithms with noisy partial observations of rank 4 matrices. We used matrices 
with dimensions ranging from 2 x 2 to 200 x 200. The percentage of missing entries ranged from 10% 
to 80%. We studied the behavior of the EM and RC algorithms in terms of the observation noise power 
and the amount of missing data, and the impact of the initialization. 

We initialized both EM and RC with random values for the unknown entries. To illustrate the influence 
of the initialization on the convergence of the algorithms, Fig. [7] shows the percentage of experiments 
that converged (to an estimate close enough to the ground truth) in less than 100 iterations, as a function 
of the mean value of the random guesses for the unknown entries. Although representative for the entire 
range of experiments done, the results in the plots of Fig. [7] were obtained with noisy observations of 
24 x 24 rank 4 matrices that missed 70% of their entries. The left plot of Fig. [7] shows three lines, each 
obtained by using EM with data generated with a ground truth matrix whose elements had mean 0.001, 
1, and 1000. The percentages of convergence for the RC algorithm are in the right plot. 

The left plot of Fig. [7] shows that the EM algorithm converges almost always if the mean value of the 
initial guesses for the missing entries is smaller than the mean value of the observed entries. When we 
increase the values of the initial estimates of the missing entries, the percentage of convergence decreases 
abruptly, becoming close to zero when those values become much larger than the ones of the observed 
entries. This is in agreement with the behavior illustrated in the example of Fig. In opposition, the 
right plot of Fig. [7] shows that the behavior of the RC algorithm is somewhat independent of the order of 



13 




Fig. 7. Percentage of convergent experiments as a function of the mean value of the initial random guesses for the unknown entries (ground 
truth matrices whose entries have mean 10 -3 , 1, and 10 3 ). Left: EM algorithm. Right: RC algorithm. 



magnitude of the initialization. The experiments that lead to a non-convergent behavior of RC were such 
that the matrices whose inverse is computed in (IT9l) and (TTSl) were close to singular. We thus conclude 
that it is very important in practical applications to provide good initial estimates for both EM and RC 
algorithms. 

Finally, we note that the relevance of a good initialization goes behind avoiding non-convergent behavior. 
In fact, we observed that both the amount of missing data and the noise level have strong impact on the 
algorithm's convergence speed. Thus, when dealing with large percentages of missing entries and high 
levels of noise, as it may arise in practice, a better initialization not only improves the chance of a 
convergent behavior but also leads to a faster convergence. 



E. Heuristic initialization 

(o) 

We now use an initial guess W r obtained by combining the column and row spaces given by the 
SVDs of the known submatrices of W r as described in the appendix. In our tests, with this simple 
initialization procedure, 100% of the runs of EM and RC converged to the ground truth matrix (mean 
square estimation error smaller than the observation noise variance) in a very small number of iterations, 
typically less than 10, even for high levels of noise. 

To illustrate tlte impact of a better initialization, we use a noisy observation W r of a 40 x 40 rank 
deficient matrix W with 900 missing entries (observation noise variance a 2 = 1). Figure [8] shows plots of 
the evolution of the estimation error, measured by the Frobenius norm, for both the EM and RC algorithms. 
While for the left hand side plot we used a random initialization, for the one in the right we used the 
complete procedure, i.e., we initialized the process by using the method described in the appendix. Since 



(W - W r ) © M r 



aV40 2 - 900 ~ 26.5, we 



the error for the ground truth matrix W is given by 

see from the plots of figure [8] that both the EM and RC algorithms provide good estimates. From the 
right hand side plot, we conclude that the initial guess provided by the heuristic method in the appendix 
enables a much faster convergence (in 2 or 3 iterations) to a solution with lower error. 



F. Sensitivity to the noise 

The plot of Fig. [9] represents the average estimation error ger entry after 20 iterations of EM and RC 
algorithms, for noisy observations of a 24 x 24 rank 4 matrix W, with 70% missing data, as a function of 
the observation noise standard deviation. We see that the average entry estimation errors after 20 iterations 
are below 10" 8 for noise standard deviation ranging from 10 -25 to 10 25 (the mean value of the entries 
of the ground truth matrix W is 1). This shows that, with a simple initialization procedure, both EM 
and RC algorithms converge to the optimal solution, even for very noisy observations. Furthermore, we 
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Fig. 8. Evolution of the estimation error for a 40 x 40 rank deficient matrix with 30 x 30 missing entries, with a random initialization (left 
plot) and with the heuristic initialization described in the appendix (right plot). 



conclude that the main impact of the observation noise is on the EM and RC convergence speeds — the 
slightly higher average error values on the right region of the plot of Fig. [9] indicates that the estimates 
were still converging to the optimal solution after 20 iterations. 




Fig. 9. Estimation errors for the algorithms EM and RC as functions of the noise standard deviation. 



G. Sensitivity to the missing data 

A relevant issue is the robustness of the rank reduction algorithms to the missing data. Our experience 
showed that the structure of the binary mask matrix M r representing the known data is more important 
than the overall percentage of missing entries of W r . When recovering 3D structure from video, feature 
points enter and leave the scene in a continuous way, thus the typical structure of M r is as represented 
in the left plot of Fig. [H or, when considering re-appearing features, as represented in the right plot of 
the same Figure. We tested the EM and RC algorithms with several mask matrices M r with this typical 
structure, with the heuristic initial guess. In these experiments, the algorithms converged always to the 
optimal solution in a very small number of iterations, independently of the percentage of missing data. 

Similarly to the impact of the noise discussed above, the percentage of missing entries has impact on 
the convergence speed. In fact, we concluded that the larger is the portion of the matrix that is observed, 
i.e., the smaller is the percentage of missing data, the lower is the estimation error after a fixed number 
of iterations, i.e., the faster is the convergence of the iterative algorithms. 
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H. Computational cost 

As referred above, the EM and RC algorithms converge in a very small number of iterations when 
initialized by the heuristic procedure described in the appendix. We now report an experimental evaluation 
of the computational costs of each iteration of EM and RC as functions of the observation matrix 
dimension. 

We used TV x 24 observation matrices with missing data corresponding to a (TV — 4) x 20 submatrix. The 
plots in Fig. [TOl represent the number of MatLab© floating point operations (FLOPS) and the computation 
time per iteration, as functions of N. From the left plot, we see that the number of FLOPS per iteration of 
the EM algorithm is larger than one of the RC algorithm. Furthermore, the FLOPS count for EM increases 
exponentially with TV (due to the SVD computation) while for RC it increases linearly with N. Thus, 
although the computation times in the right plot of Fig. [10] are smaller for EM than for RC (the reason 
being the very efficient MatLab© implementation of the SVD), we conclude that RC is computationally 
simpler than EM. RC is even as simple as the methods to deal with complete matrices, since the most 
efficient way to compute the SVD is to use the power method [TT6l of which RC is a simple generalization. 




Fig. 10. Computational cost of each iteration of the algorithms EM and RC. Left: number of MatLab® floating point operations (FLOPS). 
Right: computation time. 

VII. Experiments — 3-D models from video 

We now describe experiments that illustrate our method to recover 3-D rigid shape and 3-D motion 
from video. Our results show: i) how the iterative algorithms of section [V] cope with video sequences with 
occlusion; and ii) how the global method we propose recovers complete 3-D models. The experiments use 
both synthesized and real video sequences. We also describe an application of the recovered 3-D models 
to video compression. 

A. Self-occlusion — ping-pong ball video sequence 

We used a real-life video clip available at Il22ll . This clip shows a rotating ping-pong ball with painted 
dots. The left image of Fig. QT] shows the first of the 52 video frames of the ball sequence. Due to 
the rotating motion, the ball occludes itself. However, it does not complete a turn, thus there are not 
re-appearing regions. 

We used simple correlation techniques to track a set of 64 feature points (these techniques are adequate 
for the short baseline video sequences we used but more sophisticated methods to cope with wide baselines 
are available in the literature Il33l0 . Due to the camera-ball 3D rotation, the region of the ball that is 
visible changes along time, leading to an observation matrix with ~ 41% missing entries. We used the 
RC algorithm to estimate the rank 4 matrix that best matched the observation and computed the 3-D 
structure by proceeding as in the factorization method of If38l . 
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In the right image of Fig. [TH we represent the recovered 3-D shape, which shows that our method 
succeeded in recovering the spherical surface of the ball. 




Fig. 11. Ping-pong ball video sequence. Left: first frame. Right: recovered 3-D shape. 



B. Complete 3-D models 

We now illustrate our method to recover complete 3-D models from video by minimizing a global 
cost function as described in section [IV] We processed the observation matrix that lead to the left plot 
of Fig. [2l The 3-D shape recovered by our global method is shown in the right plot, where each of the 
pairs of re-appearing features have been correctly detected as representing the same 3-D point. 

Fig. IT2l describes an experiment with an object described by a larger number of 3-D feature points. We 
synthesized noisy versions of the 2-D trajectories of 372 feature points located on the 3-D surface of a 
rotating cylinder. Then, we simulated occlusion and inclusion by removing significant segments of those 
trajectories. The left plot of Fig. IT2l shows one of the 50 synthesized frames. The small circles denote the 
noiseless projections and the points denote their noisy version, i.e., the data that is observed. Note that 
only an incomplete view portion of the cylinder is observed in each frame. 




Fig. 12. Cylinder image sequence. Left: one synthetic frame. Right: recovered 3-D shape. 

The data from the cylinder sequence was then collected on a 100 x 372 observation matrix W with 
9537 unknown entries (~ 26% of the total number). We used the global factorization algorithm to find the 
re-arranged matrix W#, i.e., to recover simplest 3-D rigid structure that could have generated the data. 
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The right plot of Fig. [12] plots the final estimate of the 3D shape. We see that the complete cylinder is 
recovered. Due to the incorporation of the rigidity constraint, the 3D positions of the features points are 
accurately estimated, even in the presence of very noisy observations (compare the plots in Fig. fT2l) . 

C. Error analysis 

We quantified the gain of using our method to recover SFM by measuring the 3-D reconstruction error. 
We synthesized noisy trajectories of 40 features randomly located in 3-D, 15 of them being "interrupted" 
to simulate object self-occlusion, leading to a 60x55 observation matrix W with 53.9% known entries 
(x— and y— coordinates in [0, 120] x [0, 160], noise standard deviation a = 3). Our method generated a 
60 x 40 re-arranged matrix W R with 74.2% known entries. By using W R rather then W to recover 
SFM, we reduced the 3-D shape estimation error by approximately 50% and the 3-D motion error by 
approximately 70%. 

In order to evaluate the sensitivity of our method to the observation noise, we plotted in Fig. [13] the 
probabilities of correctly detecting re-appearing features and the probability of incorrect detections (false 
alarms) as functions of the noise level, for the experimental setup described above. These probabilities 
were estimated from 100 runs for each level of noise. Although re-appearing features become harder to 
detect as the noise level increases, our method correctly detects more than 90% of them when the noise 
level is below a = 5. The plot of Fig. [13] also shows that our method has approximately zero false alarms 
for noise levels below a = 7. Note that the x— and y— coordinates are in [0, 120] x [0, 160], so this level 
of performance seems to indicate that our method is able to cope with the noise in trajectories likely to 
happen when processing the majority of real videos. 
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Fig. 13. Probability of detection and probability of false alarm. 



D. Real video — Rubik's cube 

This video — see two representative frames on the left and middle images of Fig. Q3] — shows a Rubik's 
cube. The video sequence was obtained by rotating a hand-held camera around a table with the cube. 
In the leftmost image of Fig. [14l we superimposed with the video frame the visible features and the 
initial parts of their trajectories. Due to the cube self-occlusion, feature points enter and leave the scene. 
In particular, along the video sequence, all features disappear and several of them re-appear because the 
camera performs a complete turn around the cube. 

To emphasize the advantages of using the algorithms described in this paper, we first applied to a 
segment of the Rubik's cube video clip the original factorization method of ll38ll for complete matrices, 
obtaining the 3D shape represented on the left side of Fig. Q31 This model was obtained with 28 features 
and 18 frames. 
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We then collected the entire set of the visible parts of the trajectories of 64 features across 85 frames 
in a 170x64 incomplete observation matrix W. The structure of the missing part of W is coded by the 
170 x 64 binary mask M represented on the right side of Figure IT4l The number of missing entries in W 
was about 62%. We computed the re-arranged matrix W R by using our global factorization. The right 
image of Fig. [15] shows the texture-mapped 3-D shape recovered from W R . 




Fig. 15. Texture mapped 3D shape recovered from the Rubik's cube video clip in Fig. [T4j Left: incomplete model obtained by using the 
factorization method of Tomasi and Kanade [1]. Right: complete shape recovered by our global factorization. 

This example illustrates how our global approach trivializes the usually hard task of merging partial 
estimates of a 3D model like the one in the left image of Fig. [151 Note that the top face is missing also 
in the right image of Fig. Q3] because the position the cube model is shown in that image was not seen in 
the video sequence. The advantage of using our approach to recover 3-D rigid structure is two-fold. First, 
while recovering a complete 3-D model by fusing partial models as the one on the left side of Fig. Q3] 
is a complex task, our method recovers directly the complete model shown on the right side of Fig. [151 
Second, rather than processing subsets of the set of features and frames at disjoint steps, our method uses 
all the information available in a global way, leading to more accurate 3-D shapes as illustrated by the 
3-D models in Fig. [151 

E. Video compression 

The 3D models recovered by our method can be used to represent in an efficient way the original video 
sequence as proposed in Q — the video sequence is represented by the 3-D shape, texture, and 3-D motion 
of the objects. This leads to significant bandwidth saving since once the 3-D shape and texture of the 
objects have been transmitted, their 3-D motion is transmitted with a few bytes per frame. 

We used this methodology to compress the entire sequence of 2161 frames of the Rubik's cube video 
clip. The compression ratio (relative to the original J-PEG compressed frames) was approximately 10 3 . 
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Fig. [16] shows sample original frames (top row) and the corresponding compressed frames (bottom row). 
The differences of lighting between the top and bottom images are due to fact that the constancy of the 
texture of the 3-D model does not model the different light reflections that happen in the real scenario. 
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Fig. 16. 3-D model-based digital video compression example. Top row: original frames. Bottom row: compressed frames. Compression 
ratio approx. 10 3 . 



VIII. Conclusion 

We proposed a global approach to build complete 3-D models from video. The 3-D model is in- 
ferred as the simplest rigid object that agrees with all the observed data, i.e., with the entire set of 
video frames. Since a key step in this global approach is the approximation of rank deficient matrices 
from incomplete observations, we developed two iterative algorithms to solve this non-linear problem: 
Expectation-Maximization (EM) and Row-Column (RC). Our experiments showed that both algorithms 
converged to the correct estimate whenever initialized by a simple procedure and that the RC algorithm 
is computationally cheaper and more robust than EM. Our experiments show that the proposed global 
method leads to more accurate estimates of 3-D shape than those obtained by either processing several 
smaller subsets of views or processing the entire video without taking into account re-appearing regions. 

Appendix 

(0) 

This appendix addresses the computation of a rank 4 initial guess W r from an incomplete observa- 

(o) 

tion W r . We find W r by computing, in an expedite way, guesses of its column space matrix A and 
its row space matrix B, from the data in W r : 

W; (0) = A 2Fx4 B 4x p r G S 4 . (26) 



A. Subspace approximation 

Before addressing the general case, we consider the simpler case where a number of columns of W r 
are entirely known, i.e., a number of feature points are present in all frames, and a number of rows 
of W r are entirely known, i.e., and a number of frames contain all feature projections. We collect those 
known columns in a submatrix and those known rows in a submatrix W B . From the data in W^, 
we estimate the column space matrix A as 

A = W A |5 4 . (27) 
From W B and the column space A, the LS estimate of the row space matrix B is, see lfT6lh 

B= {A T B A B y l A T B -W B) (28) 
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where A B collects the rows of A that correspond to the rows of W B . We see that the matrix A^A# must 
be nonsingular so the matrix must have at least 4 linearly independent columns and the matrix W B 
must have at least 4 linearly independent rows. 



B. Subspace combination 

In general, it may be impossible to find 4 entire columns and 4 entire rows without missing elements 
in the matrix W r . We must then estimate the column and row space matrices A and B by combining 
the spaces that correspond to smaller submatrices of W r . We describe the algorithm for combining two 
column/row space matrices. The process is then repeated until the entire matrix W r has been processed. 

Select from W two submatrices Wi and W 2 that have at least 4 columns and 4 rows without missing 
elements. We factorize Wi and W 2 using (1271) to obtain the corresponding column space matrices Ai 
and A 2 . If the observation matrix W is in fact well modelled by a rank 4 matrix, the submatrices 
of Ai and A 2 that correspond to common rows in W, denoted by A 12 and A 21 , are related by a linear 
transformation, 

Ai 2 ~ A 21 N 4X 4. (29) 

We compute N as the LS solution of (1291) and assemble a combined column space matrix A for the rows 
corresponding to Wi and W 2 : 



N=(A^A 2 i) 1 A21 A12, and 



A 2V N 



(30) 



where A 2 \i denotes the submatrix of A 2 that collects the rows that do not correspond to rows of Wi. 
We compute the combined row space matrix B by using an analogous procedure. 
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